NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available free of charge during the embargo (administrative interval).


Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought

Man, Yunze; Huang, De-An; Liu, Guilin; Sheng, Shiwei; Liu, Shilong; Gui, Liang-Yan; Kautz, Jan; Wang, Yu-Xiong; Yu, Zhiding (June 2025, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Free, publicly accessible full text available June 11, 2026
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation

Zhao, Yue; Xue, Fuzhao; Reed, Scott; Fan, Linxi; Zhu, Yuke; Kautz, Jan; Yu, Zhiding; Krähenbühl, Philipp; Huang, De-An (February 2025, cs.CV)

We introduce Quantized Language-Image Pretraining (QLIP), a visual tokenization method that combines state-of-the-art reconstruction quality with state-of-the-art zero-shot image understanding. QLIP trains a binary-spherical-quantization-based autoencoder with reconstruction and language-image alignment objectives. We are the first to show that the two objectives do not need to be at odds. We balance the two loss terms dynamically during training and show that a two-stage training pipeline effectively reconciles the large-batch requirements of image-language pre-training with the memory bottleneck imposed by the reconstruction objective. We validate the effectiveness of QLIP for multimodal understanding and text-conditioned image generation with a single model. Specifically, QLIP serves as a drop-in replacement for the visual encoder in LLaVA and for the image tokenizer in LlamaGen, with comparable or even better performance. Finally, we demonstrate that QLIP enables a unified mixed-modality auto-regressive model for understanding and generation.
Free, publicly accessible full text available February 7, 2026
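The QLIP abstract names binary spherical quantization (BSQ) as the tokenizer's bottleneck. A minimal sketch of that quantization step, assuming the general BSQ formulation rather than QLIP's actual code: an encoder output v of length L is projected onto the unit sphere, each coordinate is replaced by its sign scaled by 1/√L, and the sign pattern is read as an L-bit token id. The function names are hypothetical, the handling of zero coordinates is an arbitrary choice, and the straight-through gradient estimator needed for training is omitted.

```python
import math
from typing import List, Tuple


def binary_spherical_quantize(v: List[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (u, q) where u = v/|v| and q = sign(u)/sqrt(L)."""
    L = len(v)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto the unit sphere")
    u = [x / norm for x in v]  # point on the unit sphere
    scale = 1.0 / math.sqrt(L)
    # Zero coordinates are mapped to +scale (an assumption; any fixed rule works).
    q = [scale if x >= 0.0 else -scale for x in u]
    return u, q


def code_index(q: List[float]) -> int:
    """Hypothetical helper: read the sign pattern of q as an L-bit integer token id."""
    idx = 0
    for i, x in enumerate(q):
        if x > 0.0:
            idx |= 1 << i  # bit i is set when coordinate i is positive
    return idx


u, q = binary_spherical_quantize([0.5, -2.0, 3.0])
print(code_index(q))  # signs (+, -, +) set bits 0 and 2 -> 5
```

Because every quantized coordinate has magnitude 1/√L, q also has unit norm, so the quantized codes stay on the same sphere as the continuous latents; with L bits the implicit codebook has 2^L entries without storing any codebook vectors.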
EAGLE: Exploring the Design Space for Multi-modal LLMs with Mixture of Encoders

Shi, Min; Liu, Fuxiao; Wang, Shihao; Liao, Shijia; Radhakrishnan, Subhashree; Zhao, Yilin; Huang, De-An; Yin, Hongxu; Sapra, Karan; Yacoob, Yaser; et al. (April 2025, ICLR 2025)

The ability to accurately interpret complex visual information is a crucial capability for multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs using a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to various existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks.
Free, publicly accessible full text available April 24, 2026
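The EAGLE abstract reports that simply concatenating visual tokens from complementary vision encoders works as well as more complex mixing schemes. A minimal sketch of one such fusion, assuming each encoder's output has already been resampled to the same number of tokens and that concatenation happens along the feature (channel) dimension of each token; the function name and the list-of-lists representation are illustrative, and the projector that maps fused tokens into the language model's embedding space is not shown.

```python
from typing import List

Tokens = List[List[float]]  # one feature vector per visual token


def concat_encoder_tokens(per_encoder: List[Tokens]) -> Tokens:
    """Hypothetical helper: fuse tokens from several encoders by per-token concatenation.

    Token i of the result is the concatenation of token i from every encoder,
    so the fused feature width is the sum of the encoders' widths.
    """
    if not per_encoder:
        raise ValueError("need at least one encoder output")
    n = len(per_encoder[0])
    if any(len(tokens) != n for tokens in per_encoder):
        raise ValueError("all encoders must emit the same number of tokens")
    fused: Tokens = []
    for i in range(n):
        vec: List[float] = []
        for tokens in per_encoder:
            vec.extend(tokens[i])
        fused.append(vec)
    return fused


# Two encoders, two tokens each, with feature widths 2 and 3.
enc_a = [[1.0, 2.0], [3.0, 4.0]]
enc_b = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
print(concat_encoder_tokens([enc_a, enc_b]))
# [[1.0, 2.0, 5.0, 6.0, 7.0], [3.0, 4.0, 8.0, 9.0, 10.0]]
```

Concatenating along the feature dimension keeps the token count, and therefore the language model's sequence length, fixed as encoders are added; appending each encoder's tokens along the sequence axis instead would lengthen the sequence.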
ARDuP: Active Region Video Diffusion for Universal Policies

https://doi.org/10.1109/IROS58592.2024.10802264

Huang, Shuaiyi; Levy, Mara; Jiang, Zhenyu; Anandkumar, Anima; Zhu, Yuke; Fan, Linxi; Huang, De-An; Shrivastava, Abhinav (October 2024, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

Full Text Available
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees

Xie, Chulin; Huang, De-An; Chu, Wenda; Xu, Daguang; Xiao, Chaowei; Li, Bo; Anandkumar, Anima (June 2024, Computer Vision and Pattern Recognition Conference (CVPR 2024))

Personalized Federated Learning (pFL) has emerged as a promising solution to data heterogeneity across clients in FL. However, existing pFL methods either (1) introduce high computation and communication costs or (2) overfit to local data, which can be limited in scope and vulnerable to evolved test samples with natural distribution shifts. In this paper, we propose PerAda, a parameter-efficient pFL framework that reduces communication and computational costs and exhibits superior generalization performance, especially under test-time distribution shifts. PerAda reduces these costs by leveraging pretrained models and updating and communicating only a small number of additional parameters from adapters. PerAda achieves high generalization by regularizing each client’s personalized adapter with a global adapter, while the global adapter uses knowledge distillation to aggregate generalized information from all clients. Theoretically, we provide generalization bounds for PerAda and prove its convergence to stationary points under non-convex settings. Empirically, PerAda demonstrates higher personalized performance (+4.85% on CheXpert) and better out-of-distribution generalization (+5.23% on CIFAR-10-C) than baselines on different datasets across natural and medical domains, while updating only 12.6% of parameters per model. Our code is available at https://github.com/NVlabs/PerAda.
Full Text Available
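The PerAda abstract says each client's personalized adapter is regularized with a global adapter. A minimal sketch of that idea, assuming an L2 (proximal) penalty of the form f_i(θ) + (λ/2)‖θ − θ_g‖², a common choice in personalized federated learning that the abstract itself does not confirm; the function names, the toy quadratic local loss, and the step size are hypothetical, and the knowledge-distillation step that updates the global adapter is not shown.

```python
from typing import Callable, List

Vector = List[float]


def regularized_loss(local_loss: Callable[[Vector], float],
                     theta_p: Vector, theta_g: Vector, lam: float) -> float:
    """Local objective plus an assumed L2 pull toward the global adapter."""
    prox = 0.5 * lam * sum((p - g) ** 2 for p, g in zip(theta_p, theta_g))
    return local_loss(theta_p) + prox


def regularized_grad_step(local_grad: Callable[[Vector], Vector],
                          theta_p: Vector, theta_g: Vector,
                          lam: float, lr: float) -> Vector:
    """One gradient step on the regularized objective for the personalized adapter."""
    g_local = local_grad(theta_p)
    return [p - lr * (gl + lam * (p - g))
            for p, gl, g in zip(theta_p, g_local, theta_g)]


# Toy client whose local optimum is 2.0; the global adapter sits at 0.0.
def toy_local_loss(th: Vector) -> float:
    return 0.5 * sum((t - 2.0) ** 2 for t in th)


def toy_local_grad(th: Vector) -> Vector:
    return [t - 2.0 for t in th]


theta = [5.0]
for _ in range(200):
    theta = regularized_grad_step(toy_local_grad, theta, [0.0], lam=1.0, lr=0.1)
print(round(theta[0], 6))  # 1.0, i.e. (2.0 + 1.0 * 0.0) / (1 + 1.0)
```

For this quadratic toy loss the regularized optimum is (c + λθ_g)/(1 + λ), where c is the local optimum, so λ interpolates between the purely local solution (λ = 0) and the global adapter (λ → ∞); that trade-off is the intuition behind using a global adapter to curb overfitting to local data.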
Potential Singularity Formation of Incompressible Axisymmetric Euler Equations with Degenerate Viscosity Coefficients

https://doi.org/10.1137/22M1470906

Hou, Thomas Y.; Huang, De (March 2023, Multiscale Modeling & Simulation)

Full Text Available
Asymptotically self-similar blowup of the Hou-Luo model for the 3D Euler equations

https://doi.org/10.1007/s40818-022-00140-7

Chen, Jiajie; Hou, Thomas Y.; Huang, De (December 2022, Annals of PDE)

Full Text Available
A potential two-scale traveling wave singularity for 3D incompressible Euler equations

https://doi.org/10.1016/j.physd.2022.133257

Hou, Thomas Y.; Huang, De (July 2022, Physica D: Nonlinear Phenomena)

Full Text Available
Matrix Concentration for Products

https://doi.org/10.1007/s10208-021-09533-9

Huang, De; Niles-Weed, Jonathan; Tropp, Joel A.; Ward, Rachel (August 2021, Foundations of Computational Mathematics)

Full Text Available
On the Finite Time Blowup of the De Gregorio Model for the 3D Euler Equations

https://arxiv.org/abs/1905.06387

Chen, Jiajie; Hou, Thomas Y.; Huang, De (April 2021, Communications on Pure and Applied Mathematics)
We present a novel method of analysis and prove finite time asymptotically self-similar blowup of the De Gregorio model [13,14] for some smooth initial data on the real line with compact support. We also prove self-similar blowup results for the generalized De Gregorio model [41] for the entire range of the parameter on $\mathbb{R}$ or $S^1$ for Hölder continuous initial data with compact support. Our strategy is to reformulate the problem of proving a finite time asymptotically self-similar singularity as the problem of establishing the nonlinear stability of an approximate self-similar profile with a small residual error, using the dynamic rescaling equation. We use the energy method with appropriate singular weight functions to extract the damping effect from the linearized operator around the approximate self-similar profile, and we take into account cancellation among various nonlocal terms to establish stability. We remark that our analysis does not rule out the possibility that the original De Gregorio model is well posed for smooth initial data on a circle. The method of analysis presented in this paper provides a promising new framework for analyzing finite time singularities of nonlinear nonlocal systems of partial differential equations.
Full Text Available
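For reference, the De Gregorio model and its generalization with a parameter a (often attributed to Okamoto, Sakajo, and Wunsch) are usually written as below, where H is the Hilbert transform; a = 1 recovers the De Gregorio model and a = 0 the Constantin-Lax-Majda model. The second line is only the generic form of an asymptotically self-similar blowup at time T, with the spatial exponent c_l left unspecified; it illustrates the scaling ω ∼ (T − t)^{-1} implied by the quadratic term u_x ω and is not quoted from the paper.

```latex
\begin{aligned}
&\omega_t + a\,u\,\omega_x = u_x\,\omega, \qquad u_x = H\omega, \qquad x \in \mathbb{R} \text{ or } S^1,\\
&\omega(x,t) \approx \frac{1}{T-t}\,\Omega\!\left(\frac{x}{(T-t)^{c_l}}\right) \quad \text{as } t \to T^-.
\end{aligned}
```

Substituting the ansatz shows every term in the equation scales like (T − t)^{-2}, since the Hilbert transform is invariant under spatial dilation; this is why the dynamic rescaling approach can trade the blowup problem for the stability of a steady profile Ω.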
